The goal of this notebook is to validate the best model identified in the previous work. Here, we follow two different applications:
client_id for Beta that resembles Release. Our application then queries the current Beta data (Version N+1) for these client_id values and calculates the metrics we care about for the covariates of interest. This is our outcome.

## Loading the training dataset
#load('~/ff-beta-release-matching/poc/matchIt/feature_selection.RData')
# The files are tab-separated despite the .csv extension
df_train_encoder <- read.csv("df_train_encoder.csv", header = TRUE, sep = "\t", encoding = "UTF-8")
df_validate_encoder <- read.csv("df_validate_encoder.csv", header = TRUE, sep = "\t", encoding = "UTF-8")
# Constant indicator column for the training set (both sets report 96 columns below)
df_train_encoder$default_search_engine_missing <- 0

| rows | columns | discrete_columns | continuous_columns | all_missing_columns | total_missing_values | complete_rows | total_observations | memory_usage |
|---|---|---|---|---|---|---|---|---|
| 302819 | 96 | 7 | 89 | 0 | 0 | 302819 | 29070624 | 176871312 |

| rows | columns | discrete_columns | continuous_columns | all_missing_columns | total_missing_values | complete_rows | total_observations | memory_usage |
|---|---|---|---|---|---|---|---|---|
| 328042 | 96 | 7 | 89 | 0 | 0 | 328042 | 31492032 | 190290232 |

library(dplyr)

# Downsample Release so that there is one Release row per `multiple` Beta rows
# (multiple = 1 gives a 50%/50% split; multiple > 1 over-represents Beta)
build_df <- function(df_c, multiple) {
  df_beta <- df_c %>% filter(label_beta == 1)
  df_rel <- df_c %>% filter(label_beta == 0)
  n_beta <- nrow(df_beta)
  df <- df_rel %>%
    sample_n(size = round(n_beta / multiple)) %>%
    rbind(df_beta)
  return(df)
}
df_train_1x <- build_df(df_train_encoder, 1)
df_validate_1x <- build_df(df_validate_encoder, 1)

Let’s check the class balance.
## [1] "Train (50% - 50%)"
## [1] "Validation (50% - 50%)"
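The 50%/50% split follows directly from build_df(): it keeps every Beta row and draws round(n_beta / multiple) Release rows. The sketch below reproduces that logic in base R on toy data so the proportions can be checked by hand; df_toy and build_df_base are hypothetical names, not part of the notebook.

```r
# Hypothetical toy stand-in for df_train_encoder (label_beta = 1 for Beta)
set.seed(42)
df_toy <- data.frame(
  client_id = sprintf("c%03d", 1:300),
  label_beta = c(rep(1, 100), rep(0, 200))
)

# Base-R version of the build_df() logic: keep all Beta rows and sample
# round(n_beta / multiple) Release rows without replacement
build_df_base <- function(df_c, multiple) {
  df_beta <- df_c[df_c$label_beta == 1, ]
  df_rel <- df_c[df_c$label_beta == 0, ]
  n_keep <- round(nrow(df_beta) / multiple)
  rbind(df_rel[sample(nrow(df_rel), n_keep), ], df_beta)
}

df_1x <- build_df_base(df_toy, 1)
# Class shares: with multiple = 1 both groups make up 50%
prop.table(table(df_1x$label_beta))

# With multiple = 2, Beta is over-represented (100 Beta vs 50 Release)
table(build_df_base(df_toy, 2)$label_beta)
```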
In this application, we balance the two groups (Beta and Release) on the other covariates (e.g., environment and performance metrics) and then compare the user engagement metrics between the balanced Beta and Release groups for version N. This tells us how Beta differs from Release in user engagement when all other covariates are equal.
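Balancing here is done with matchit(..., 'nearest', replace = TRUE). Roughly, a propensity score (by default a logistic-regression probability of being treated) is estimated from the covariates, each treated unit is paired with the control whose score is closest, and because of replace = TRUE a control can be reused. The base-R sketch below illustrates that idea on toy data with two made-up covariates; it is a simplification, not MatchIt's implementation, and its weighting (a control used k times counts k times) is only illustrative.

```r
set.seed(1)
# Hypothetical toy data: treat = 1 plays the role of the treated group
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$treat <- rbinom(n, 1, plogis(0.8 * toy$x1 - 0.5 * toy$x2))

# 1. Propensity score from a logistic regression
ps_fit <- glm(treat ~ x1 + x2, data = toy, family = binomial())
toy$distance <- fitted(ps_fit)

# 2. 1:1 nearest-neighbour matching WITH replacement: every treated unit
#    takes the control with the closest score; controls may repeat
treated <- which(toy$treat == 1)
controls <- which(toy$treat == 0)
match_idx <- vapply(treated, function(i) {
  controls[which.min(abs(toy$distance[controls] - toy$distance[i]))]
}, integer(1))

# 3. How often each distinct control was used
ctrl_weights <- table(match_idx)
length(treated)       # all treated units are matched
length(ctrl_weights)  # distinct controls actually used (can be far fewer)
```

Reusing controls is why the sample-size summary further down can report fewer matched controls than matched treated units.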
Set up the experiment selected in the previous work.
covariates <- c('daily_num_sessions_started', 'daily_num_sessions_started_max', 'FX_PAGE_LOAD_MS_2_PARENT', 'memory_mb', 'num_active_days', 'num_addons', 'num_bookmarks', 'profile_age', 'session_length', 'session_length_max','TIME_TO_DOM_COMPLETE_MS','TIME_TO_DOM_CONTENT_LOADED_END_MS','TIME_TO_DOM_INTERACTIVE_MS','TIME_TO_LOAD_EVENT_END_MS','TIME_TO_NON_BLANK_PAINT_MS')
engagement <- c('active_hours','active_hours_max','uri_count','uri_count_max','search_count','search_count_max','num_pages','num_pages_max','daily_max_tabs','daily_max_tabs_max','daily_unique_domains','daily_unique_domains_max','daily_tabs_opened','daily_tabs_opened_max')

The best model from the previous work:
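# Nearest-neighbour matching on the propensity score, with replacement
nn <- matchit(formula = generate_formula(covariates, label), df_train_1x, 'nearest', replace = TRUE)
df_matched <- match.data(nn)
# Map the treatment indicator back to readable group names
df_matched$label <- plyr::mapvalues(df_matched$is_release, from = c(FALSE, TRUE), to = c('beta', 'release'))
print(summary(nn))

##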
## Call:
## matchit(formula = generate_formula(covariates, label), data = df_train_1x,
## method = "nearest", replace = TRUE)
##
## Summary of balance for all data:
## Means Treated Means Control SD Control
## distance 0.6236 0.3764 0.2021
## daily_num_sessions_started 2.8999 2.3689 2.7099
## daily_num_sessions_started_max 5.2789 4.2814 4.8046
## FX_PAGE_LOAD_MS_2_PARENT 3027.2395 3463.7084 1920.1256
## memory_mb 9437.1264 8965.1561 7925.6741
## num_active_days 5.5807 5.3462 2.2644
## num_addons 5.6489 7.8554 3.3349
## num_bookmarks 160.3317 242.4878 1292.6145
## profile_age 896.6628 893.7534 762.4791
## session_length 9.2420 12.2962 14.7435
## session_length_max 18.1651 22.7066 30.2272
## TIME_TO_DOM_COMPLETE_MS 3286.9842 4388.9143 4254.1363
## TIME_TO_DOM_CONTENT_LOADED_END_MS 2293.7636 2737.6381 2700.5353
## TIME_TO_DOM_INTERACTIVE_MS 1792.4702 2404.3932 2397.0177
## TIME_TO_LOAD_EVENT_END_MS 3011.2550 4126.9123 4007.0480
## TIME_TO_NON_BLANK_PAINT_MS 1442.6659 1833.6557 2128.4506
## Mean Diff eQQ Med eQQ Mean eQQ Max
## distance 0.2472 0.2578 0.2472 0.3180
## daily_num_sessions_started 0.5310 0.4250 0.5318 1.6250
## daily_num_sessions_started_max 0.9975 1.0000 0.9994 12.0000
## FX_PAGE_LOAD_MS_2_PARENT -436.4690 294.4388 436.5274 1248.9389
## memory_mb 471.9702 27.0000 520.5812 196252.0000
## num_active_days 0.2345 0.0000 0.2465 1.0000
## num_addons -2.2064 2.0000 2.2088 114.0000
## num_bookmarks -82.1561 1.0000 82.6463 21769.0000
## profile_age 2.9094 25.0000 25.5655 1384.0000
## session_length -3.0542 1.4295 3.0567 150.2233
## session_length_max -4.5415 2.6847 4.5523 935.7919
## TIME_TO_DOM_COMPLETE_MS -1101.9301 418.7669 1101.9620 15499.6121
## TIME_TO_DOM_CONTENT_LOADED_END_MS -443.8745 228.7603 443.9028 10881.3696
## TIME_TO_DOM_INTERACTIVE_MS -611.9231 242.9859 611.9817 22911.2000
## TIME_TO_LOAD_EVENT_END_MS -1115.6573 438.5717 1115.7014 16697.6975
## TIME_TO_NON_BLANK_PAINT_MS -390.9897 156.9658 391.0893 29396.6429
##
##
## Summary of balance for matched data:
## Means Treated Means Control SD Control
## distance 0.6236 0.6236 0.2010
## daily_num_sessions_started 2.8999 4.5150 5.3190
## daily_num_sessions_started_max 5.2789 8.3996 9.9855
## FX_PAGE_LOAD_MS_2_PARENT 3027.2395 2898.3402 1631.7936
## memory_mb 9437.1264 13929.8965 16819.5954
## num_active_days 5.5807 5.9742 2.0587
## num_addons 5.6489 6.2318 1.9625
## num_bookmarks 160.3317 543.9529 3192.8106
## profile_age 896.6628 925.0090 785.0875
## session_length 9.2420 8.3001 13.0964
## session_length_max 18.1651 19.1759 63.4434
## TIME_TO_DOM_COMPLETE_MS 3286.9842 3144.0420 3117.5267
## TIME_TO_DOM_CONTENT_LOADED_END_MS 2293.7636 2837.8774 3917.2605
## TIME_TO_DOM_INTERACTIVE_MS 1792.4702 1719.1141 1783.4556
## TIME_TO_LOAD_EVENT_END_MS 3011.2550 2798.3931 2634.1006
## TIME_TO_NON_BLANK_PAINT_MS 1442.6659 1396.3843 1694.5964
## Mean Diff eQQ Med eQQ Mean eQQ Max
## distance 0.0000 0.1678 0.1611 0.2054
## daily_num_sessions_started -1.6151 0.2143 0.2369 2.7500
## daily_num_sessions_started_max -3.1207 0.0000 0.4142 12.0000
## FX_PAGE_LOAD_MS_2_PARENT 128.8992 98.7674 203.3276 908.3304
## memory_mb -4492.7701 23.0000 613.4854 195904.0000
## num_active_days -0.3935 0.0000 0.1592 1.0000
## num_addons -0.5829 1.5000 1.4790 8.4000
## num_bookmarks -383.6211 2.0000 85.8880 21043.6250
## profile_age -28.3463 17.0000 23.3302 1484.0000
## session_length 0.9419 0.1518 1.4165 143.4417
## session_length_max -1.0108 0.3483 2.3806 969.3572
## TIME_TO_DOM_COMPLETE_MS 142.9422 147.9143 549.9064 6621.0697
## TIME_TO_DOM_CONTENT_LOADED_END_MS -544.1138 91.2539 265.8759 4801.4920
## TIME_TO_DOM_INTERACTIVE_MS 73.3560 91.9479 310.3343 4396.0455
## TIME_TO_LOAD_EVENT_END_MS 212.8619 144.2673 548.0156 6756.1084
## TIME_TO_NON_BLANK_PAINT_MS 46.2817 61.4867 197.9214 13681.3200
##
## Percent Balance Improvement:
## Mean Diff. eQQ Med eQQ Mean eQQ Max
## distance 100.0000 34.9070 34.8278 35.4074
## daily_num_sessions_started -204.1764 49.5798 55.4625 -69.2308
## daily_num_sessions_started_max -212.8466 100.0000 58.5504 0.0000
## FX_PAGE_LOAD_MS_2_PARENT 70.4677 66.4557 53.4216 27.2718
## memory_mb -851.9181 14.8148 -17.8462 0.1773
## num_active_days -67.8061 0.0000 35.4217 0.0000
## num_addons 73.5837 25.0000 33.0419 92.6316
## num_bookmarks -366.9418 -100.0000 -3.9224 3.3321
## profile_age -874.2978 32.0000 8.7436 -7.2254
## session_length 69.1599 89.3797 53.6585 4.5144
## session_length_max 77.7431 87.0253 47.7048 -3.5868
## TIME_TO_DOM_COMPLETE_MS 87.0280 64.6786 50.0975 57.2824
## TIME_TO_DOM_CONTENT_LOADED_END_MS -22.5828 60.1094 40.1049 55.8742
## TIME_TO_DOM_INTERACTIVE_MS 88.0122 62.1592 49.2903 80.8127
## TIME_TO_LOAD_EVENT_END_MS 80.9205 67.1052 50.8815 59.5387
## TIME_TO_NON_BLANK_PAINT_MS 88.1629 60.8280 49.3923 53.4596
##
## Sample sizes:
## Control Treated
## All 59627 59627
## Matched 21051 59627
## Unmatched 38576 0
## Discarded 0 0
table_match <- CreateTableOne(vars = covariates, strata = "label", data = df_matched, test = FALSE)
print(table_match, smd = TRUE)

## Stratified by label
## beta
## n 21051
## daily_num_sessions_started (mean (SD)) 2.75 (3.17)
## daily_num_sessions_started_max (mean (SD)) 5.01 (5.67)
## FX_PAGE_LOAD_MS_2_PARENT (mean (SD)) 3229.24 (1805.07)
## memory_mb (mean (SD)) 9530.79 (9308.07)
## num_active_days (mean (SD)) 5.55 (2.19)
## num_addons (mean (SD)) 7.12 (2.55)
## num_bookmarks (mean (SD)) 244.46 (1473.54)
## profile_age (mean (SD)) 911.98 (777.07)
## session_length (mean (SD)) 10.63 (13.05)
## session_length_max (mean (SD)) 20.46 (31.85)
## TIME_TO_DOM_COMPLETE_MS (mean (SD)) 3836.39 (3693.24)
## TIME_TO_DOM_CONTENT_LOADED_END_MS (mean (SD)) 2558.47 (2672.28)
## TIME_TO_DOM_INTERACTIVE_MS (mean (SD)) 2101.36 (2058.07)
## TIME_TO_LOAD_EVENT_END_MS (mean (SD)) 3558.01 (3430.90)
## TIME_TO_NON_BLANK_PAINT_MS (mean (SD)) 1640.41 (1881.21)
## Stratified by label
## release SMD
## n 59627
## daily_num_sessions_started (mean (SD)) 2.90 (2.94) 0.050
## daily_num_sessions_started_max (mean (SD)) 5.28 (5.31) 0.049
## FX_PAGE_LOAD_MS_2_PARENT (mean (SD)) 3027.24 (1578.89) 0.119
## memory_mb (mean (SD)) 9437.13 (8683.71) 0.010
## num_active_days (mean (SD)) 5.58 (2.06) 0.015
## num_addons (mean (SD)) 5.65 (2.22) 0.615
## num_bookmarks (mean (SD)) 160.33 (661.20) 0.074
## profile_age (mean (SD)) 896.66 (771.74) 0.020
## session_length (mean (SD)) 9.24 (9.47) 0.122
## session_length_max (mean (SD)) 18.17 (19.46) 0.087
## TIME_TO_DOM_COMPLETE_MS (mean (SD)) 3286.98 (2685.73) 0.170
## TIME_TO_DOM_CONTENT_LOADED_END_MS (mean (SD)) 2293.76 (2234.97) 0.107
## TIME_TO_DOM_INTERACTIVE_MS (mean (SD)) 1792.47 (1488.20) 0.172
## TIME_TO_LOAD_EVENT_END_MS (mean (SD)) 3011.26 (2427.41) 0.184
## TIME_TO_NON_BLANK_PAINT_MS (mean (SD)) 1442.67 (1380.44) 0.120
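For a continuous variable, tableone's SMD is, as far as we can tell from its documentation, the absolute mean difference divided by the pooled standard deviation sqrt((s1^2 + s2^2) / 2). The sketch below recomputes two entries of the SMD column from the printed means and SDs and counts how many covariates fall under the usual 0.1 threshold; smd() and smd_table are illustrative names.

```r
# SMD for a continuous variable: |m1 - m2| / sqrt((s1^2 + s2^2) / 2)
smd <- function(m1, s1, m2, s2) {
  abs(m1 - m2) / sqrt((s1^2 + s2^2) / 2)
}

# Means/SDs copied from the table above (beta first, release second)
smd(7.12, 2.55, 5.65, 2.22)   # num_addons, reported as 0.615
smd(10.63, 13.05, 9.24, 9.47) # session_length, reported as 0.122

# SMD column as printed above, in covariate order
smd_table <- c(0.050, 0.049, 0.119, 0.010, 0.015, 0.615, 0.074, 0.020,
               0.122, 0.087, 0.170, 0.107, 0.172, 0.184, 0.120)
sum(smd_table < 0.1)          # covariates below the 0.1 threshold
```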
num_addons). However, after matching (adjusted cases), the standardized mean difference (SMD) is smaller. Even so, only 7 of the 15 covariates fall below the commonly used threshold of \(0.1\) in absolute value, and num_addons (0.615) remains clearly imbalanced.

stats_post_mean <- calc_means(df_matched, engagement) %>%
dplyr::select(-label) %>%
set_rownames(c('beta (mean)', 'release (mean)'))
stats_post_median <- calc_medians(df_matched, engagement) %>%
dplyr::select(-label) %>%
set_rownames(c('beta (median)', 'release (median)'))
stats_post <- calc_delta(df_matched, engagement)
stats_mean <- stats_post_mean %>%
rbind(stats_post[1, ]) %>%
set_rownames(c('beta (mean)', 'release (mean)', 'delta (mean)'))
stats_median <- stats_post_median %>%
rbind(stats_post[2, ]) %>%
set_rownames(c('beta (median)', 'release (median)', 'delta (median)'))
stats <- stats_mean %>%
rbind(stats_median)
knitr::kable(stats) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
row_spec(c(3,6), bold = T, color = "white", background = "#4D5686") %>%
scroll_box(width = "100%")

|  | active_hours | active_hours_max | uri_count | uri_count_max | search_count | search_count_max | num_pages | num_pages_max | daily_max_tabs | daily_max_tabs_max | daily_unique_domains | daily_unique_domains_max | daily_tabs_opened | daily_tabs_opened_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| beta (mean) | 0.7959226 | 1.5377521 | 150.2729335 | 308.9288395 | 2.3187671 | 5.4231153 | 1.632206e+04 | 1.650547e+04 | 8.3707078 | 12.4062515 | 4.7555407 | 8.2166331 | 18.4296738 | 36.0778110 |
| release (mean) | 0.8447582 | 1.6203453 | 156.1124914 | 319.3721301 | 2.3742188 | 5.4467272 | 1.738034e+04 | 1.757009e+04 | 6.1059912 | 9.2342731 | 4.9637533 | 8.5511265 | 16.9665119 | 33.0464555 |
| delta (mean) | 0.0578102 | 0.0509726 | 0.0374061 | 0.0326994 | 0.0233558 | 0.0043351 | 6.088980e-02 | 6.059230e-02 | 0.3709007 | 0.3435006 | 0.0419466 | 0.0391169 | 0.0862382 | 0.0917301 |
| beta (median) | 0.5187500 | 1.0513889 | 86.4000000 | 174.0000000 | 0.8000000 | 2.0000000 | 3.869700e+03 | 4.032000e+03 | 3.8333333 | 6.0000000 | 3.3571429 | 5.0000000 | 8.2857143 | 16.0000000 |
| release (median) | 0.5743056 | 1.1527778 | 97.2500000 | 196.0000000 | 0.8750000 | 2.0000000 | 5.571333e+03 | 5.746000e+03 | 3.7142857 | 6.0000000 | 3.6000000 | 6.0000000 | 8.8333333 | 17.0000000 |
| delta (median) | 0.0967352 | 0.0879518 | 0.1115681 | 0.1122449 | 0.0857143 | 0.0000000 | 3.054266e-01 | 2.982945e-01 | 0.0320513 | 0.0000000 | 0.0674603 | 0.1666667 | 0.0619946 | 0.0588235 |
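The source of calc_delta() is not shown, but the delta rows in the table above are consistent with the absolute relative difference with respect to Release, |beta - release| / release. The sketch below assumes that definition (rel_delta is a hypothetical name) and reproduces a few entries.

```r
# Assumed definition, inferred from the table rather than from calc_delta():
# delta = |beta - release| / release
rel_delta <- function(beta, release) abs(beta - release) / release

# active_hours: mean and median rows from the table above
rel_delta(0.7959226, 0.8447582)  # reported delta (mean): 0.0578102
rel_delta(0.5187500, 0.5743056)  # reported delta (median): 0.0967352

# daily_max_tabs: Beta above Release still gives a positive delta
rel_delta(8.3707078, 6.1059912)  # reported delta (mean): 0.3709007
```

Dividing by the Release value makes the delta a unit-free share of the Release level, so metrics on very different scales (num_pages vs active_hours) can be compared side by side.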

| metric | label | active_hours | active_hours_max | uri_count | uri_count_max | search_count | search_count_max | num_pages | num_pages_max | daily_max_tabs | daily_max_tabs_max | daily_unique_domains | daily_unique_domains_max | daily_tabs_opened | daily_tabs_opened_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | beta | 0.8236611 | 1.577508 | 152.74550 | 311.0213 | 2.4506498 | 5.636171 | 17363.463 | 17558.93 | 9.603628 | 13.811495 | 5.060464 | 8.743610 | 20.491908 | 39.64786 |
| mean | beta - matched | 0.7959226 | 1.537752 | 150.27293 | 308.9288 | 2.3187671 | 5.423115 | 16322.057 | 16505.47 | 8.370708 | 12.406251 | 4.755541 | 8.216633 | 18.429674 | 36.07781 |
| mean | release | 0.8447582 | 1.620345 | 156.11249 | 319.3721 | 2.3742188 | 5.446727 | 17380.343 | 17570.09 | 6.105991 | 9.234273 | 4.963753 | 8.551127 | 16.966512 | 33.04646 |
| median | beta | 0.5309524 | 1.063889 | 86.66667 | 172.0000 | 0.8333333 | 2.000000 | 4185.667 | 4340.00 | 4.250000 | 6.000000 | 3.562500 | 5.500000 | 9.000000 | 17.00000 |
| median | beta - matched | 0.5187500 | 1.051389 | 86.40000 | 174.0000 | 0.8000000 | 2.000000 | 3869.700 | 4032.00 | 3.833333 | 6.000000 | 3.357143 | 5.000000 | 8.285714 | 16.00000 |
| median | release | 0.5743056 | 1.152778 | 97.25000 | 196.0000 | 0.8750000 | 2.000000 | 5571.333 | 5746.00 | 3.714286 | 6.000000 | 3.600000 | 6.000000 | 8.833333 | 17.00000 |
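
We use the two-sample Kolmogorov-Smirnov (KS) statistic to compare the distribution of each engagement metric between the balanced Beta and Release groups for version v67. The KS statistic is the maximum absolute difference between the two empirical cumulative distribution functions (ECDFs); it ranges from 0 to 1, and smaller values indicate more similar distributions, that is, better balance.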
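To make the definition concrete, the sketch below computes the KS statistic directly as the largest gap between two ECDFs, evaluated at every observed value, and checks it against stats::ks.test on hypothetical Poisson counts. ks_stat is an illustrative name.

```r
# Two-sample KS statistic: max |F_x(v) - F_y(v)| over all observed values v
ks_stat <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

set.seed(7)
x <- rpois(500, 3)    # hypothetical "Beta" counts
y <- rpois(500, 3.5)  # hypothetical "Release" counts

ks_stat(x, y)
# Ties in count data only affect the p-value, so the statistic should agree
suppressWarnings(ks.test(x, y))$statistic
```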
# KS statistic for each engagement metric, matched Beta vs Release (v67)
df_beta <- df_matched %>% filter(label == 'beta')
df_rel <- df_matched %>% filter(label == 'release')
output <- data.frame()
for (i in engagement) {
  x_t <- df_beta[[i]]
  y_t <- df_rel[[i]]
  # Count metrics contain ties, which make ks.test() warn about its p-value;
  # only the statistic is used here
  ks <- suppressWarnings(ks.test(x_t, y_t))$statistic
  output <- rbind(output, data.frame(KS = unname(ks), row.names = i))
}

|  | KS |
|---|---|
| active_hours | 0.0505550 |
| active_hours_max | 0.0457469 |
| uri_count | 0.0510276 |
| uri_count_max | 0.0522226 |
| search_count | 0.0179955 |
| search_count_max | 0.0193284 |
| num_pages | 0.0673132 |
| num_pages_max | 0.0662038 |
| daily_max_tabs | 0.0466099 |
| daily_max_tabs_max | 0.0434794 |
| daily_unique_domains | 0.0472977 |
| daily_unique_domains_max | 0.0476850 |
| daily_tabs_opened | 0.0331167 |
| daily_tabs_opened_max | 0.0378840 |

Here, we display density plots of each user engagement metric for the two groups, so we can visually compare their distributions. The degree to which the two densities overlap is a good indicator of balance on that metric: marked differences in shape can indicate poor balance even when the mean differences and variance ratios are well within thresholds.
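If a number is wanted alongside the visual check, one option (not used in this notebook) is the overlapping coefficient: the area under min(f_beta, f_release), which is 1 for identical densities and 0 for disjoint ones. The sketch below estimates it with stats::density on a shared grid and a shared bandwidth; overlap_coef and the normal toy samples are hypothetical.

```r
# Overlapping coefficient: area under the pointwise minimum of two densities
overlap_coef <- function(x, y, n = 1024) {
  # Shared bandwidth, so smoothing alone does not create a difference
  bw <- max(bw.nrd0(x), bw.nrd0(y))
  lo <- min(x, y) - 3 * bw
  hi <- max(x, y) + 3 * bw
  dx <- density(x, bw = bw, from = lo, to = hi, n = n)
  dy <- density(y, bw = bw, from = lo, to = hi, n = n)
  step <- dx$x[2] - dx$x[1]
  sum(pmin(dx$y, dy$y)) * step  # Riemann sum of the shared area
}

set.seed(3)
a <- rnorm(2000)              # hypothetical "Beta" metric
b <- rnorm(2000, mean = 0.3)  # hypothetical "Release" metric
overlap_coef(a, b)       # high overlap for a small shift
overlap_coef(a, a + 10)  # almost no overlap
```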
The following violin plots depict the distributions for the following subsets:
NOTE: Guiding lines have been added for the following:
plots <- list()
for (covariate in engagement) {
  # Release mean and median for this metric, used as guiding lines
  stats_rel <- stats %>% filter(label == 'release') %>% dplyr::select(all_of(c(covariate, 'metric')))
  means <- stats_rel[stats_rel$metric == 'mean', covariate]
  medians <- stats_rel[stats_rel$metric == 'median', covariate]
  # Beta mean for this metric, also drawn as a guiding line
  ho_means <- stats %>% filter(label == 'beta' & metric == 'mean') %>% dplyr::select(all_of(covariate))
  plots[[covariate]] <- compare_log_cont(df_training_full, covariate, means, medians, as.numeric(ho_means), print = FALSE)
}

The density and violin plots show noticeable differences between the two groups (Beta and Release) in some user engagement metrics, listed as follows.
- num_pages
- num_pages_max
- active_hours
- active_hours_max
- uri_count
- uri_count_max
- daily_unique_domains
- daily_unique_domains_max

In this application, we balance the Beta and Release datasets so that they resemble each other on the variables we are concerned with, namely the user engagement metrics. Balancing, in this case, yields a set of client_id for Beta that resembles Release. Following these clients over time shows how their engagement actually changes: if we see changes that are larger than anticipated, then something significant is happening in user engagement that we can “forecast” for the subsequent Release.
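The matching code for this application is not shown here. As one possible illustration (not necessarily the notebook's method), the sketch below pairs each Release client with the closest Beta client in standardized engagement space, with replacement, and collects the resulting Beta client_id values. All data, column subsets, and names are hypothetical.

```r
set.seed(11)
# Hypothetical toy engagement data for one version (N)
beta <- data.frame(client_id = sprintf("b%03d", 1:200),
                   active_hours = rexp(200, 1.2), uri_count = rpois(200, 150))
release <- data.frame(client_id = sprintf("r%03d", 1:100),
                      active_hours = rexp(100, 1.0), uri_count = rpois(100, 160))
metrics <- c("active_hours", "uri_count")

# Standardize with the pooled mean/SD so each metric weighs equally
pooled <- rbind(beta[metrics], release[metrics])
ctr <- colMeans(pooled)
scl <- apply(pooled, 2, sd)
zb <- scale(beta[metrics], center = ctr, scale = scl)
zr <- scale(release[metrics], center = ctr, scale = scl)

# For each Release client, the Beta client at the smallest squared
# Euclidean distance (Beta clients may be picked more than once)
nearest_beta <- apply(zr, 1, function(r) which.min(colSums((t(zb) - r)^2)))
matched_ids <- unique(beta$client_id[nearest_beta])
length(matched_ids)  # Beta client_ids that resemble Release on engagement
```

These client_id values are what would then be looked up in the next version's Beta data.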
First, we determine the number of training (v67) Beta and Release clients that are in the validation set (v68).
## label freq
## 1 beta 38861
## 2 release 38184
Let’s express these counts as a share of the training clients in each group:
## Percentage of beta mutual clients: 65 %
## Percentage of release mutual clients: 64 %
Hence, about two thirds of the training clients (65% of Beta and 64% of Release) also appear in the validation set.
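The percentages can be checked by hand, assuming the denominator is the 59,627 clients per group that the matchit summary reports for df_train_1x. The second half of the sketch shows the generic form of both steps on hypothetical client_id vectors: the share of training clients found in validation, and the subsetting of validation rows to matched clients. pct_mutual and the toy vectors are illustrative names.

```r
# Check of the reported shares (assumed denominator: 59627 per group)
round(100 * 38861 / 59627)  # Beta
round(100 * 38184 / 59627)  # Release

# Generic version with hypothetical client_id vectors
pct_mutual <- function(train_ids, validate_ids) {
  100 * mean(unique(train_ids) %in% validate_ids)
}
train_ids <- c("a", "b", "c", "d")
validate_ids <- c("b", "c", "e")
pct_mutual(train_ids, validate_ids)

# Keep only the validation rows whose client_id appears in training
df_validate_toy <- data.frame(client_id = validate_ids, uri_count = c(10, 20, 30))
df_validate_toy[df_validate_toy$client_id %in% train_ids, ]
```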
Subset the validation clients down to those matched:
## label freq
## 1 beta 14132
## 2 release 10467
stats_pre <- calc_delta(df_validate_1x, engagement)
stats_post <- calc_delta(df_validate_matched, engagement)
stats_mean <- stats_pre[1, ] %>%
rbind(stats_post[1, ]) %>%
set_rownames(c('pre-matching', 'post-matching'))
stats_median <- stats_pre[2, ] %>%
rbind(stats_post[2, ]) %>%
set_rownames(c('pre-matching', 'post-matching'))

Mean
knitr::kable(stats_mean) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
scroll_box(width = "100%")

|  | active_hours | active_hours_max | uri_count | uri_count_max | search_count | search_count_max | num_pages | num_pages_max | daily_max_tabs | daily_max_tabs_max | daily_unique_domains | daily_unique_domains_max | daily_tabs_opened | daily_tabs_opened_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching | 0.0646358 | 0.1028066 | 0.0816237 | 0.1303186 | 0.0581926 | 0.1046028 | 0.0923660 | 0.0930137 | 0.4204475 | 0.3377663 | 0.0006247 | 0.0364702 | 0.1570896 | 0.1021035 |
| post-matching | 0.0947241 | 0.0926328 | 0.0725612 | 0.0879248 | 0.0372849 | 0.0350132 | 0.0596316 | 0.0604804 | 0.3614949 | 0.2775021 | 0.0162564 | 0.0114660 | 0.0277262 | 0.0180453 |

Median
knitr::kable(stats_median) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
scroll_box(width = "100%")

|  | active_hours | active_hours_max | uri_count | uri_count_max | search_count | search_count_max | num_pages | num_pages_max | daily_max_tabs | daily_max_tabs_max | daily_unique_domains | daily_unique_domains_max | daily_tabs_opened | daily_tabs_opened_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching | 0.1259557 | 0.175981 | 0.1785714 | 0.2437811 | 0.25 | 0.3333333 | 0.3768946 | 0.3679381 | 0.0855263 | 0 | 0.0454545 | 0.1333333 | 0.0555556 | 0.1176471 |
| post-matching | 0.1281585 | 0.127551 | 0.1351068 | 0.1603376 | 0.00 | 0.0000000 | 0.2484032 | 0.2429324 | 0.0526316 | 0 | 0.0316688 | 0.0769231 | 0.0847458 | 0.1052632 |

| metric | label | active_hours | active_hours_max | uri_count | uri_count_max | search_count | search_count_max | num_pages | num_pages_max | daily_max_tabs | daily_max_tabs_max | daily_unique_domains | daily_unique_domains_max | daily_tabs_opened | daily_tabs_opened_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | beta | 0.7988445 | 1.471189 | 146.33392 | 287.3003 | 2.324319 | 5.100206 | 15614.038 | 15779.79 | 9.019717 | 12.828446 | 5.148112 | 8.581990 | 20.03166 | 37.29553 |
| mean | beta - matched | 0.8617226 | 1.657703 | 161.98428 | 336.1304 | 2.510444 | 5.829748 | 20109.092 | 20310.03 | 8.297907 | 12.104231 | 5.411060 | 9.465749 | 18.80546 | 37.14110 |
| mean | release | 0.8540465 | 1.639768 | 159.33983 | 330.3512 | 2.467935 | 5.696027 | 17203.011 | 17398.05 | 6.349912 | 9.589452 | 5.144898 | 8.906823 | 17.31211 | 33.84032 |
| median | beta | 0.5027778 | 0.962500 | 80.50000 | 152.0000 | 0.750000 | 2.000000 | 3347.500 | 3513.00 | 4.125000 | 6.000000 | 3.500000 | 5.200000 | 8.50000 | 15.00000 |
| median | beta - matched | 0.5830440 | 1.187500 | 98.30952 | 199.0000 | 1.000000 | 3.000000 | 6628.833 | 6828.75 | 4.000000 | 6.000000 | 3.785005 | 6.000000 | 9.00000 | 17.00000 |
| median | release | 0.5752315 | 1.168056 | 98.00000 | 201.0000 | 1.000000 | 3.000000 | 5372.286 | 5558.00 | 3.800000 | 6.000000 | 3.666667 | 6.000000 | 9.00000 | 17.00000 |

plots <- list()
for (covariate in engagement) {
  # Release mean and median for this metric, used as guiding lines
  stats_rel <- stats %>% filter(label == 'release') %>% dplyr::select(all_of(c(covariate, 'metric')))
  means <- stats_rel[stats_rel$metric == 'mean', covariate]
  medians <- stats_rel[stats_rel$metric == 'median', covariate]
  # Beta mean for this metric, also drawn as a guiding line
  ho_means <- stats %>% filter(label == 'beta' & metric == 'mean') %>% dplyr::select(all_of(covariate))
  plots[[covariate]] <- compare_log_cont(df_validate_full, covariate, means, medians, as.numeric(ho_means), print = FALSE)
}

Once again, we use the KS statistic to compare the distributions of the user engagement metrics between the matched Beta clients and the Release clients, this time on the validation version (v68) for the clients matched on the training version (v67). Reminder: smaller KS values indicate better balance.
# KS statistic for each engagement metric, matched Beta vs Release (v68 validation data)
df_beta <- df_validate_full %>% filter(label == 'beta - matched')
df_rel <- df_validate_full %>% filter(label == 'release')
output <- data.frame()
for (i in engagement) {
  x_t <- df_beta[[i]]
  y_t <- df_rel[[i]]
  # Count metrics contain ties, which make ks.test() warn about its p-value;
  # only the statistic is used here
  ks <- suppressWarnings(ks.test(x_t, y_t))$statistic
  output <- rbind(output, data.frame(KS = unname(ks), row.names = i))
}

|  | KS |
|---|---|
| active_hours | 0.0072743 |
| active_hours_max | 0.0092511 |
| uri_count | 0.0071471 |
| uri_count_max | 0.0092912 |
| search_count | 0.0086209 |
| search_count_max | 0.0089323 |
| num_pages | 0.0433436 |
| num_pages_max | 0.0432005 |
| daily_max_tabs | 0.0479976 |
| daily_max_tabs_max | 0.0436476 |
| daily_unique_domains | 0.0178247 |
| daily_unique_domains_max | 0.0216601 |
| daily_tabs_opened | 0.0179124 |
| daily_tabs_opened_max | 0.0189196 |

The following violin plots depict the distributions for the following subsets:
NOTE: Guiding lines have been added for the following:
plots <- list()
for (covariate in engagement) {
  # Release mean and median for this metric, used as guiding lines
  stats_rel <- stats %>% filter(label == 'release') %>% dplyr::select(all_of(c(covariate, 'metric')))
  means <- stats_rel[stats_rel$metric == 'mean', covariate]
  medians <- stats_rel[stats_rel$metric == 'median', covariate]
  # Beta mean for this metric, also drawn as a guiding line
  ho_means <- stats %>% filter(label == 'beta' & metric == 'mean') %>% dplyr::select(all_of(covariate))
  plots[[covariate]] <- compare_log_cont(df_validate_full, covariate, means, medians, as.numeric(ho_means), print = FALSE)
}

Our main objective was to determine whether the user engagement metrics changed in the newest Beta version relative to the previous Release version. The density and violin plots show noticeable differences between the two groups (Beta and Release) in some user engagement metrics, listed as follows.
- daily_max_tabs
- daily_max_tabs_max
- num_pages
- num_pages_max
- daily_unique_domains